Causal Relationships in the Quantitative Social Sciences II

PSCI 3300.002 Political Science Research Methods

A. Jordan Nafa

University of North Texas

2/16/23

Overview

  • Formalizing the logic of cause and effect

  • Formal and informal representations of theories of politics

    • Connecting research question, data, and theory gives us substance
  • Problem Set I will be posted later today and is due on February 28th

\[ \definecolor{treat}{RGB}{27,208,213} \definecolor{outcome}{RGB}{98,252,107} \definecolor{baseconf}{RGB}{244,199,58} \definecolor{covariates}{RGB}{178,26,1} \definecolor{index}{RGB}{37,236,167} \definecolor{timeid}{RGB}{244,101,22} \definecolor{mu}{RGB}{71,119,239} \definecolor{sigma}{RGB}{219,58,7} \newcommand{normalcolor}{\color{white}} \newcommand{treat}[1]{\color{treat} #1 \normalcolor} \newcommand{resp}[1]{\color{outcome} #1 \normalcolor} \newcommand{sample}[1]{\color{baseconf} #1 \normalcolor} \newcommand{covar}[1]{\color{covariates} #1 \normalcolor} \newcommand{obs}[1]{\color{index} #1 \normalcolor} \newcommand{tim}[1]{\color{timeid} #1 \normalcolor} \newcommand{mean}[1]{\color{mu} #1 \normalcolor} \newcommand{vari}[1]{\color{sigma} #1 \normalcolor} \]

Review of Key Terms

  • Correlation: A measure of the extent to which two or more variables tend to occur together.

    • If two variables \(\treat{X}\) and \(\resp{Y}\) tend to occur together or increase at the same rate, we would say they are positively correlated

    • If the occurrence of \(\treat{X}\) is unrelated to \(\resp{Y}\), we would say these two variables are uncorrelated

    • If when \(\treat{X}\) occurs we are less likely to observe \(\resp{Y}\), we would say these two variables are negatively correlated

  • Line of Best Fit: A line that minimizes, for some measure of distance, the average distance between the data points and the line
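The three cases above can be illustrated with a quick simulated sketch in R (the variables here are hypothetical, not drawn from any real dataset):

```r
# A minimal sketch of the sign of a correlation using simulated data
set.seed(1)
x <- rnorm(100)

cor(x,  2 * x + rnorm(100))    # positive: y tends to increase with x
cor(x, -2 * x + rnorm(100))    # negative: y tends to decrease as x increases
cor(x, rnorm(100))             # near zero: x and y are unrelated
```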

Linear Correlation Example

We can estimate this correlation in R with a simple linear model and data from the Varieties of Democracy project’s {vdemdata} package

# Get the data we need from the vdemdata package
vdem_df <- vdemdata::vdem %>% 
  # We'll use just the year 2018 here for simplicity
  filter(year == 2018) %>%  
  # Transmute a subset of the data for plotting
  transmute(
    country_name, 
    polyarchy = v2x_polyarchy*10, 
    gender_equality = v2x_gender*10
    )

# Estimate the linear relationship
lm_democ_gender <- lm(polyarchy ~ gender_equality, data = vdem_df)

# Print a summary of the result
broom::tidy(lm_democ_gender)
# A tibble: 2 × 5
  term            estimate std.error statistic  p.value
  <chr>              <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)        -3.03    0.486      -6.24 3.14e- 9
2 gender_equality     1.12    0.0642     17.5  2.35e-40
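The slope from a simple linear model like the one above is the correlation coefficient rescaled by the ratio of the two standard deviations, \(\beta = \rho \cdot \sigma_{y}/\sigma_{x}\). A sketch with simulated stand-ins for the two variables (the real `vdem_df` would work the same way):

```r
# Simulated stand-ins for polyarchy and gender_equality (hypothetical data)
set.seed(7)
df <- data.frame(gender_equality = runif(200, 0, 10))
df$polyarchy <- -3 + 1.1 * df$gender_equality + rnorm(200)

# The lm() slope equals the correlation times sd(y)/sd(x)
beta <- coef(lm(polyarchy ~ gender_equality, data = df))["gender_equality"]
rho  <- cor(df$polyarchy, df$gender_equality)
all.equal(unname(beta), rho * sd(df$polyarchy) / sd(df$gender_equality))
```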

Linear Correlation Example

Review of Key Terms

  • Mean \(\mean{\mu}_{\treat{x}}\): \(\frac{\sum_{\obs{i}=\obs{1}}^{\sample{n}} \treat{x}_{\obs{i}}}{\sample{n}}\)

    • The average value of a numeric variable

    • Deviation from the mean describes the distance between an observation’s value for some variable relative to its mean

  • Variance \(\vari{\sigma}_{\treat{x}}^{2}\): \(\frac{\sum_{\obs{i}=\obs{1}}^{\sample{n}} (\treat{x}_{\obs{i}} - \mean{\mu}_{\treat{x}})^{2}}{\sample{n}}\)

    • A measure of how much variation a variable exhibits. It is the average of the square of the deviations from the mean

Review of Key Terms

  • Standard Deviation \(\vari{\sigma}_{\treat{x}}\): \(\sqrt{\vari{\sigma}_{\treat{x}}^{2}}\)

    • Another measure of the variability in some variable. It is the square root of the variance and has the advantage of being on roughly the same scale as the variable itself
  • Covariance \(Cov_{\treat{x}, \resp{y}} = \frac{\sum_{\obs{i}=\obs{1}}^{\sample{n}} (\treat{x}_{\obs{i}} - \mean{\mu}_{\treat{x}})(\resp{y}_{\obs{i}} - \mean{\mu}_{\resp{y}})}{\sample{n}}\)

    • A measure of how strongly two variables vary together. It is the average of the product of their deviations from their respective means
  • Correlation Coefficient \(\rho_{\treat{x}, \resp{y}} = \frac{Cov_{\treat{x}, \resp{y}}}{\vari{\sigma}_{\treat{x}} \cdot \vari{\sigma}_{\resp{y}}}\)

    • Another measure of the (linear) correlation between two variables. It is obtained by dividing the covariance by the product of the standard deviations
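Each of these definitions can be checked against R's built-in functions with simulated data. One caveat: the formulas above divide by \(n\), while R's `var()`, `sd()`, and `cov()` divide by \(n - 1\); the adjustment cancels in the correlation coefficient.

```r
# Compute each quantity from its definition and compare with built-ins
set.seed(123)
n <- 1000
x <- rnorm(n, mean = 5, sd = 2)
y <- 1 + 0.5 * x + rnorm(n)

mu_x   <- sum(x) / n                        # mean
var_x  <- sum((x - mu_x)^2) / n             # variance (divides by n)
sd_x   <- sqrt(var_x)                       # standard deviation
mu_y   <- sum(y) / n
sd_y   <- sqrt(sum((y - mu_y)^2) / n)
cov_xy <- sum((x - mu_x) * (y - mu_y)) / n  # covariance
rho_xy <- cov_xy / (sd_x * sd_y)            # correlation coefficient

# The n versus n - 1 adjustment cancels in the ratio, so this matches cor()
all.equal(rho_xy, cor(x, y))
```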

Causal Questions

  • How does \(\treat{X}\) influence \(\resp{Y}\)?

    • Good research questions are broad but also precise

    • Your research question should clearly state your treatment and outcome of interest

  • How does the prospect of inclusion for historically dispossessed social groups influence voter turnout?

  • Do state-level policies restricting abortion care adversely impact access to contraception?

  • How does the onset of conflict impact a country’s economic growth?

  • Does American military presence abroad prevent political instability?

Defining Causal Effects

  • Potential Outcomes formally encode counterfactuals

    • \(\resp{Y}_{\obs{i}}(\treat{X} = 1)\) is the outcome we would observe if unit \(i\) was treated

    • \(\resp{Y}_{\obs{i}}(\treat{X} = 0)\) is the outcome we would observe if unit \(i\) was untreated

  • Consistency assumption connects observed outcomes to potential outcomes

    • \(\resp{Y}_{\obs{i}} = \resp{Y}_{\obs{i}}(\treat{X}_{\obs{i}})\) means we observe the potential outcome for the observed treatment
  • Causal effect for unit \(i\) is \(\resp{Y}_{\obs{i}}(\treat{X} = 1) - \resp{Y}_{\obs{i}}(\treat{X} = 0)\)

    • The difference between the unit’s two potential outcomes

Fundamental Problem of Causal Inference

  • For each individual \(\obs{i}\) we can only observe \(\treat{X}_{\obs{i}} = 1\) or \(\treat{X}_{\obs{i}} = 0\)

  • It is logically impossible to directly observe \(\resp{Y}_{\obs{i}}(\treat{X} = 1) - \resp{Y}_{\obs{i}}(\treat{X} = 0)\)

  • We can also generalize this to non-binary treatments (i.e., categorical, continuous, multivariate)
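A small simulated sketch (hypothetical data, not from the slides) makes the problem concrete: in a simulation we can generate both potential outcomes, but the observed data reveal only one per unit.

```r
# Generate BOTH potential outcomes, then reveal only the one consistent
# with the assigned treatment
set.seed(42)
n  <- 8
po <- data.frame(
  id = 1:n,
  y0 = rnorm(n, mean = 10),   # potential outcome under control
  y1 = rnorm(n, mean = 12)    # potential outcome under treatment
)
po$x <- rbinom(n, size = 1, prob = 0.5)   # treatment assignment

# Consistency: the observed outcome is the potential outcome for the
# treatment actually received; its counterfactual counterpart is missing
po$y_obs <- ifelse(po$x == 1, po$y1, po$y0)

# In real data only id, x, and y_obs are available, so the unit-level
# effect y1 - y0 can never be computed directly
po[, c("id", "x", "y_obs")]
```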

Key Assumptions

  • Causal Ordering: \(\treat{X}_{\obs{i}} \longrightarrow \resp{Y}_{\obs{i}}\)

    • No reverse causality or simultaneity
  • Consistency: \(\resp{Y}_{\obs{i}} = \resp{Y}_{\obs{i}}(\treat{x})\) if \(\treat{X}_{\obs{i}} = \treat{x}\)

    • There are no hidden versions of the treatment, or any variation in the treatment is irrelevant
  • No Interference between Units: \(\resp{Y}_{\obs{i}}(\treat{X}_{1}, \treat{X}_{2}, \dots, \treat{X}_{\sample{n}}) = \resp{Y}_{\obs{i}}(\treat{X}_{\obs{i}})\)

    • A unit’s outcome is not affected by the treatments other units receive
  • Consistency + No Interference = SUTVA (stable unit treatment value assumption)

Manipulation

  • \(\resp{Y}_{\obs{i}}(\treat{x})\) is the value that \(\resp{Y}\) would take under \(\treat{X}_{\obs{i}}\) set to \(\treat{x}\)

    • Common to use uppercase to denote a theoretical quantity and lowercase to denote a realization
  • In principle, for the causal effect of \(\treat{X}_{\obs{i}}\) to be properly defined, the treatment should be manipulable at least in theory

    • There can be no causation without manipulation
  • This can be complicated for certain immutable characteristics such as race, sex, etc.

    • One option is to focus on places where we could, at least in theory, manipulate these characteristics

Estimands

  • In a perfect world, we could estimate observation-level causal effects

    • \(\resp{Y}_{\obs{i}}(\treat{X}_{\obs{i}} = 1) - \resp{Y}_{\obs{i}}(\treat{X}_{\obs{i}} = 0)\)

    • In practice this requires strong assumptions and is virtually impossible to identify

  • Instead, we usually attempt to estimate the average causal effect (ACE) in some subset of the population

  • Our estimand connects theory to data

  • Every study must be able to answer the question “What is your estimand?” (Lundberg, Johnson, and Stewart 2021)

Estimands

  • Sample Average Treatment Effect (SATE)

    \[ \mathrm{SATE} = \frac{1}{\sample{n}}\sum_{\obs{i}=1}^{\sample{n}}[\resp{Y}_{\obs{i}}(\treat{X}_{\obs{i}} = 1) - \resp{Y}_{\obs{i}}(\treat{X}_{\obs{i}} = 0)]\\ = \frac{\sum_{\obs{i}=1}^{\sample{n}}[\resp{Y}_{\obs{i}}(\treat{X}_{\obs{i}} = 1) - \resp{Y}_{\obs{i}}(\treat{X}_{\obs{i}} = 0)]}{\sample{n}} \]

    • The difference between the average outcome if everyone were treated and the average outcome if no one were treated

    • Answers the question “What would happen if everyone received the treatment compared to no one receiving it?”
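In a simulation where both potential outcomes are known, the SATE can be computed directly from the definition above; in real data it must be estimated. A minimal hypothetical sketch:

```r
# With simulated potential outcomes the SATE is the average unit-level effect
set.seed(10)
n  <- 1000
y0 <- rnorm(n, mean = 5)
y1 <- y0 + 2                 # a constant treatment effect of 2
sate <- mean(y1 - y0)
sate                         # exactly 2 by construction
```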

Estimands

  • Sample Average Treatment Effect for the Treated (SATT)

    \[ \mathrm{SATT} = \frac{1}{\sample{n}_{1}}\sum_{\obs{i}=1}^{\sample{n}}\treat{X}_{\obs{i}}[\resp{Y}_{\obs{i}}(\treat{X}_{\obs{i}} = 1) - \resp{Y}_{\obs{i}}(\treat{X}_{\obs{i}} = 0)]\\ = \frac{1}{\sample{n}_{1}}\sum_{\obs{i}=1}^{\sample{n}}\treat{X}_{\obs{i}}[\resp{Y}_{\obs{i}} - \resp{Y}_{\obs{i}}(\treat{X}_{\obs{i}} = 0)]\\ = \frac{\sum_{\obs{i}=1}^{\sample{n}}\treat{X}_{\obs{i}}[\resp{Y}_{\obs{i}} - \resp{Y}_{\obs{i}}(\treat{X}_{\obs{i}} = 0)]}{\sample{n}_{1}} \]

  • Answers the question “What is the average effect of the treatment among those who received treatment?”
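A simulated sketch (all names hypothetical) shows how the SATT can diverge from the SATE when unit-level effects are heterogeneous and treatment uptake is related to effect size:

```r
set.seed(11)
n   <- 10000
tau <- rexp(n)                       # heterogeneous unit-level effects
y0  <- rnorm(n)
y1  <- y0 + tau
# units with larger effects are more likely to take the treatment
x   <- rbinom(n, size = 1, prob = plogis(tau - 1))

sate <- mean(y1 - y0)                # average effect over everyone
satt <- sum(x * (y1 - y0)) / sum(x)  # average effect among the treated
c(SATE = sate, SATT = satt)          # SATT exceeds SATE here
```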

Population versus Sample Estimands

  • SATE and SATT are specific to a given study

    • Theoretically, these estimands are well defined without the kind of magical thinking people usually rely on for inference about a population (e.g., repeated sampling)

    • They are finite sample or finite population estimands

  • What if we want to make inferences about a larger population though?

    • Assume units are a random sample from a large/infinite population and hedge our bets on the central limit theorem

    • In many areas of political science and international relations, this is not a logically defensible assumption

Population Estimands

  • Population Average Treatment Effects (PATE)

    \[ \mathrm{PATE} = \mathrm{E}[\resp{Y}_{\obs{i}}(\treat{X}_{\obs{i}} = 1) - \resp{Y}_{\obs{i}}(\treat{X}_{\obs{i}} = 0)] \]

  • Population Average Treatment Effect for the Treated (PATT)

    \[ \mathrm{PATT} = \mathrm{E}[\resp{Y}_{\obs{i}}(\treat{X}_{\obs{i}} = 1) - \resp{Y}_{\obs{i}}(\treat{X}_{\obs{i}} = 0) | \treat{X}_{\obs{i}} = 1] \]

  • In the case of a non-random sample from an apparent population where repeated sampling is logically impossible, \(\mathrm{PATE} = \mathrm{SATE}\) and \(\mathrm{PATT} = \mathrm{SATT}\)
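Under the random-sampling assumption, the SATE from any given sample is an unbiased estimate of the PATE. A simulated sketch with a hypothetical population of unit-level effects:

```r
# Simulate a large "population" of unit-level effects and check that SATEs
# from repeated random samples center on the PATE
set.seed(5)
N    <- 1e6
tau  <- rnorm(N, mean = 1.5)          # population of unit-level effects
pate <- mean(tau)                     # expectation over the population

# SATEs from 200 random samples of 100 units each
sates <- replicate(200, mean(sample(tau, size = 100)))
c(PATE = pate, mean_of_SATEs = mean(sates))
```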

Other Interesting Estimands

  • Conditional Average Treatment Effect (CATE)

    \[ \mathrm{CATE} = \mathrm{E}[\resp{Y}_{\obs{i}}(\treat{X}_{\obs{i}} = 1) - \resp{Y}_{\obs{i}}(\treat{X}_{\obs{i}} = 0) | \treat{X}_{\obs{i}} = \treat{x}] \]

    • We can use this to detect heterogeneous effects for theory testing or targeting
  • We will discuss additional estimands for direct and indirect causal effects later in the course
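CATEs are often defined conditional on a covariate rather than the treatment itself; a sketch of the heterogeneity a CATE can detect, using a hypothetical binary group indicator `z`:

```r
# Simulated potential outcomes where the effect differs by group z
set.seed(99)
n  <- 1000
z  <- rbinom(n, size = 1, prob = 0.5)   # hypothetical group indicator
y0 <- rnorm(n)
y1 <- y0 + 1 + 2 * z                    # effect is 1 when z = 0, 3 when z = 1

# Average unit-level effect within each level of z
tapply(y1 - y0, z, mean)
```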

Summary

  • Causal inference is about counterfactuals

  • Potential outcomes constitute a way of formally expressing counterfactual relationships

    • There are many possible estimands of interest: any possible difference between potential outcomes
  • Estimands connect research question, theory, and data

    • A quantitative study that cannot answer the question “What is your estimand?” is effectively useless
  • Goal of contemporary political science is to provide vague answers to precise questions

Where We’re Headed

  • Theoretical relationships as Bayesian networks

    • Types of bias and directed acyclic graphs

    • Difference between direct and indirect effects

  • Applied example to make this all more concrete

    • Replicating Barnes and Holman (2020)
  • Moving from causal question to causal theory

References

Gill, Jeff, and Simon Heuberger. 2020. “Bayesian Modeling and Inference: A Post-Modern Perspective.” In The SAGE Handbook of Research Methods in Political Science and International Relations, eds. Luigi Curini and Robert Franzese. London, UK: SAGE, 961–84.
Lundberg, Ian, Rebecca Johnson, and Brandon M. Stewart. 2021. “What Is Your Estimand? Defining the Target Quantity Connects Statistical Evidence to Theory.” American Sociological Review 86(3): 532–65.
Rubin, Donald B. 1974. “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies.” Journal of Educational Psychology 66(5): 688–701.
———. 2005. “Causal Inference Using Potential Outcomes.” Journal of the American Statistical Association 100(469): 322–31.
Western, Bruce, and Simon Jackman. 1994. “Bayesian Inference for Comparative Research.” American Political Science Review 88: 412–23.